Neural representations for modeling variation in speech
Abstract
• Neural acoustic models can be used to automatically model pronunciation variation.
• Pronunciation variation is best captured by intermediate layers of transformer models.
• Transformer-based embeddings capture details not expressed by phonetic transcriptions.

Variation in speech is often quantified by comparing phonetic transcriptions of the same utterance. However, manually transcribing speech is time-consuming and error prone. As an alternative, therefore, we investigate the extraction of acoustic embeddings from several self-supervised neural models. We use these representations to compute word-based pronunciation differences between non-native and native speakers of English, and between Norwegian dialect speakers. For comparison with earlier studies, we evaluate how well these differences match human perception by comparing them with available human judgements of similarity. We show that speech representations extracted from a specific type of neural model (i.e. Transformers) lead to a better match with human perception than two earlier approaches on the basis of phonetic transcriptions and MFCC-based acoustic features. We furthermore find that features from the neural models are generally better extracted from one of the middle hidden layers than from the final layer. We also demonstrate that neural speech representations capture not only segmental differences, but also intonational and durational differences that cannot adequately be represented by a set of discrete symbols used in phonetic transcriptions.
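The abstract does not say how two recordings of the same word are turned into a single difference score. One common way to compare two variable-length sequences of frame-level feature vectors is dynamic time warping (DTW). The sketch below assumes DTW with Euclidean frame distances and a simple length normalization. The function names and the toy vectors are hypothetical stand-ins for embeddings taken from a hidden layer, not the authors' implementation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Length-normalized DTW cost between two sequences of feature vectors.

    Assumption: each sequence is a list of frames, and each frame is a list
    of floats (for example, one hidden-layer embedding per audio frame).
    """
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")

    inf = float("inf")
    # cost[i][j] holds the cheapest alignment of the first i frames of seq_a
    # with the first j frames of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance only in seq_a
                cost[i][j - 1],      # advance only in seq_b
                cost[i - 1][j - 1],  # advance in both
            )

    # Illustrative normalization so longer words are not penalized for length.
    return cost[n][m] / (n + m)


# Toy usage with made-up two-dimensional "embeddings".
word_speaker_1 = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
word_speaker_2 = [[0.0, 0.0], [1.0, 1.5], [2.0, 2.0]]
difference = dtw_distance(word_speaker_1, word_speaker_2)
```

Averaging such word-level scores over a shared word list would give one speaker-level difference, which could then be correlated with human similarity judgements. That aggregation step is also an assumption here.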
Similar resources
Modeling Pronunciation Variation for Speech Recognition
CERTIFICATE This is to certify that the work contained in this thesis titled Modeling Pronunciation Variation for Speech Recognition submitted by Gopala Krishna Anumanchipalli for the award of the degree of Master of Science (by Research) in Computer Science & Engineering is a bonafide record of research work carried out by him under our supervision. The contents of this thesis, in full or in p...
Modeling Pronunciation Variation for Cantonese Speech Recognition
Due to the large variability of pronunciation in spontaneous speech, pronunciation modeling becomes a more challenging and essential part in speech recognition. In this paper, we describe two different approaches of pronunciation modeling by using decision tree. At lexical level, a pronunciation variation dictionary is built to obtain alternative pronunciations for each word, in which each entr...
Speech Sound Perception and Neural Representations
This commentary reviews some of the main findings in speech sound perception using the brain imaging techniques and comments briefly on the recent findings by the session contributors. The main emphasis is on the experimental settings used in these studies. The aim is to demonstrate how the search for the neural correlates for abstract linguistic units has resulted in various types of experimen...
Modeling Pronunciation Variation in Automatic Speech Recognition
The performance of automatic speech recognition systems varies widely across different contexts. Very good performance can be achieved on single-speaker, large-vocabulary dictation in a clean acoustic environment, as well as on very small vocabulary tasks (such as digit recognition) with fewer constraints on the speakers and acoustic conditions. In other domains, such as meeting transcription o...
Modeling pronunciation variation using artificial neural networks for English spontaneous speech
Pronunciation variation in conversational speech has caused significant amount of word errors in large vocabulary automatic speech recognition. Rule-based approaches and decision-tree based approaches have been previously proposed to model pronunciation variation. In this paper, we report our work on modeling pronunciation variation using artificial neural networks (ANN). The results we achieve...
Journal
Journal title: Journal of Phonetics
Year: 2022
ISSN: 1095-8576, 0095-4470
DOI: https://doi.org/10.1016/j.wocn.2022.101137